Abstract

Differences between plant genomes range from single nucleotide polymorphisms to large-scale duplications, 
deletions and rearrangements. The large polymorphisms are termed structural variants (SVs). SVs have received sig- 
nificant attention in human genetics and were found to be responsible for various chronic diseases. However, little 
effort has been directed towards understanding the role of SVs in plants. Many recent advances in plant genetics 
have resulted from improvements in high-resolution technologies for measuring SVs, including microarray-based 
techniques, and more recently, high-throughput DNA sequencing. In this review, we summarize recent reports of SVs
in plants and describe the genomic technologies currently used to measure these SVs.

Keywords: structural variations (SVs); next-generation sequencing (NGS); copy number variations (CNVs); presence and 
absence variations (PAVs); inversions; translocations 



INTRODUCTION 

Plant species frequently possess unique features in 
terms of their habitat, growth and reproduction, 
often owing to differences in their genomes. 
Unlocking the information present within plant genomes
will advance our understanding of some of the
basic biological phenomena that make individual 
plant species special and may help in the improve- 
ment of agronomic crop species. A central challenge 
in genome studies is to correlate genomic DNA 
variation with observed heritable phenotypes [1]. 
The ability to detect genomic differences between 
individuals is the foundation of these studies, and 
technologies to detect genomic variation have 
advanced significantly in recent years. Plant genome 
variation exists in many forms, and these variations 
can be beneficial, neutral or deleterious to the plant. 
The first differences observed in plant genome com- 
position were mainly in the number and structure of 
chromosomes, observed using microscopy. However,
during the past two decades, the application of 
molecular genetic markers has dominated this 
experimental landscape [2]. Molecular marker 
technology has advanced from laborious and
expensive restriction fragment length polymorphisms to
high-throughput sequence-based markers such as
simple sequence repeats and single nucleotide
polymorphisms (SNPs) [3]. Since the introduction
of next-generation DNA sequencing (NGS) technology,
SNPs have come to dominate molecular
genetic studies [2, 4–6]. Recent developments have
demonstrated that SNPs do not capture all the 
meaningful genomic variations that contribute to 
phenotypic differences [7] and that larger structural 
variants (SVs) also play an important role. SVs are 
defined as genomic variations that involve segments 
of DNA larger than 1 kb in length [8]. SVs include
insertions/deletions (InDels), inversions, translocations
and copy number variations (CNVs) [8]. SVs can
also be classified as microscopic or submicroscopic
depending on the method of their detection. The
mechanism of SV formation has been an active area 
of research. Human studies revealed two main 
mechanisms of SV formation, which rely on 
sequence similarity at DNA breakpoints. The first 
mechanism is known as nonhomologous end- 
joining (NHEJ) and requires a very low level of 
sequence similarity at the breakpoints. NHEJ is the 
result of aberrant repair of uneven double-stranded 
breaks produced following DNA damage [9, 10]. 
A second mechanism proposed for repetitive 
sequences in the genome is termed non-allelic 
homologous recombination and this requires high 
sequence similarity at the breakpoints [11, 12]. 
Plant genomes host large numbers of repetitive sequences,
ranging from 10% in Arabidopsis to >80% in
bread wheat (Triticum aestivum), and many plants contain
multiple copies of entire chromosomes in the
form of ploidy levels (from diploid to octaploid and 
higher) that arise from spontaneous genome duplica- 
tion (autopolyploidy) or hybridization of chromo- 
somes from different species (allopolyploidy). In 
addition to recent genome duplications, there is sub- 
stantial evidence of ancient duplication events in 
various evolutionary lineages (paleopolyploidy). SVs 
can arise through duplication events, with differential 
loss of genes between lineages. In addition, trans- 
posons can play important roles in genome evolution 
and may also generate SVs. Several other mechan- 
isms for SV production have also been proposed, 
such as fork stalling and template switching 
(FoSTeS) [13]. 
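
The size-based definition given above can be illustrated with a minimal sketch. Only the 1 kb threshold comes from the text; the `Variant` record, the function name `classify_variant` and the output labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Threshold from the text: SVs involve segments larger than 1 kb [8].
SV_MIN_LENGTH_BP = 1000


@dataclass
class Variant:
    kind: str       # e.g. "substitution", "insertion", "deletion", "inversion"
    length_bp: int  # length of the affected DNA segment in base pairs


def classify_variant(variant: Variant) -> str:
    """Assign a coarse size class (labels are illustrative, not standard)."""
    if variant.kind == "substitution" and variant.length_bp == 1:
        return "SNP"
    if variant.length_bp > SV_MIN_LENGTH_BP:
        return "SV"
    return "small variant"
```

Under this sketch, a 2.5 kb deletion is labelled an SV, whereas a 12 bp insertion falls below the threshold and is not.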

In human genetics, SVs have been extensively 
studied for their association with chronic disease 
[14]. However, in plants, studies of SVs are more 
limited. In the 10 years since the sequencing of 
the Arabidopsis genome, the genomes of several 
plant species have become available [15], and the 
cost of sequencing or re-sequencing genomes has 
reduced significantly, enabling the high-throughput 
genome-wide analysis of variants such as SNPs and 
SVs. Recently, SVs have been identified in several 
plant species, including Arabidopsis [16], barley 
(Hordeum vulgare) [17, 18], foxtail millet (Setaria italica) 
[19], maize (Zea mays) [7, 20, 21], rice (Oryza sativa)
[22], sorghum (Sorghum bicolor) [23], soybean (Glycine
max) [24] and wheat (T. aestivum) [25], and in several
cases, SVs were found to be associated with pheno- 
typic variation (Table 1). In this review we focus on 
submicroscopic SVs and present methods for their 
identification and characterization. In addition, we
provide a brief account of current research into 
microscopic SVs. 

TYPES OF SVs 
Microscopic SVs 

After chromosomes were identified as the carriers of
genes in the early 20th century, a number of karyotype
studies were conducted to determine the size
and number of chromosomes in different species. 
Features could be visualized directly on chromo- 
somes through a microscope using cytogenetic tech- 
niques such as chromosome painting or fluorescent 
in situ hybridization (FISH). The earliest unbanded
karyotypes consisted of relatively short condensed 
chromosomes that were barely distinguishable from 
one another. However, changes in chromosome 
numbers and highly abnormal chromosomes could 
be distinguished. Later, solid-stained chromosomes 
were used to detect secondary constrictions, satellite
regions and size variations in heterochromatic
regions [42]. By using chromosome-banding tech- 
niques, more discrete structural variations could be 
identified in plant genomes. An alternative strategy, 
FISH, allows the positioning of unique sequences 
and repetitive DNA on chromosomes. At this reso- 
lution, common variations such as changes in length 
or inversions of the pericentric heterochromatic 
region of chromosomes could be identified. 

Genomic in situ hybridization was the first tech- 
nique that used fluorescent labels for analysing 
genome organization in interspecific hybrids, allopo- 
lyploid species and interspecific introgression lines 
[43]. FISH, together with chromosomal arm ratios
and the mapping of heterochromatic regions, was
used to characterize inbred lines of maize and lily (Lilium
spp.) [44, 45]. In several plant species, large cloned
genomic regions maintained as bacterial artificial
chromosomes (BACs) have also been successfully
used as FISH probes to determine the chromosomal 
location of specific sequences [46, 47]. Recently, 
FISH has been used to survey CNVs using 18 randomly
selected potato (Solanum tuberosum) BAC
clones in 16 potato cultivars with diverse genetic
backgrounds. Six BACs with insert sizes of 137–145 kb
were found to be associated with large
CNVs. Four genes affected by CNVs displayed a
dosage effect in transcription and probably
affected the growth and development of the
potato plants [36]. FISH screening using subtracted
random polymerase chain reaction (PCR) libraries as 
probes also provided the positions of microsatellite
and chromosome-specific subtelomeric sequences
[48]. Cytogenetically detectable heterochromatic
variants have been used for species distinction and 
relationship studies in plants [49, 50]. These initial 
studies have provided knowledge of genome size 
variation that demonstrated the relatively consistent 
nature of genomes within a species. However, 
microscopic variations could be found even among 
closely related species, and these might be correlated 
with various adaptive features at the nuclear and 
organismic levels in plants. Microscopic variations 
in some genera occur in a discontinuous manner, 
forming groups of taxa, which are separated by regu- 
lar time intervals. However, some genera showed 
continuous variation [49]. These facts demonstrated 
that microscopic genome variations could be used as 
corroborative evidence in plant systematics. 

Submicroscopic SVs 

Recent advances in DNA sequencing technology 
have allowed plant structural genetic variations to be 
analysed at a higher resolution than the microscopic 
studies described above. These SVs have been identi- 
fied in either a genome-wide or a targeted manner, 
with varying degrees of resolution. Relatively little is 
known about genomic SVs and their association with 
phenotypic characteristics in plants. However, reports 
on such variants have started to appear (Table 1). Here 
we review recent SV studies in plant genomes. 

Copy number variations 

The term CNV is used to define sequences that 
demonstrate a variable copy number between indi- 
viduals. The term has been used to describe duplica- 
tions, deletions and insertions [51]. CNVs have been 
extensively characterized in maize [7]. In one study, a
genome-wide comparison of two inbred lines, B73
and Mo17, identified 400 putative CNVs, and these
CNVs were reported to be the result of tandem duplications
[7]. In a subsequent study, a genome-wide
comparison of a set of 14 inbred maize lines identified
thousands of CNVs [20]. In a further study in
maize, CNVs were examined in 19 diverse inbred 
maize lines and 14 teosinte accessions [21]. This 
identified 479 genes with higher copy number and 
3410 genes with fewer copies following comparison 
with a reference genome. Most of these CNVs were 
found to be present in related wild individuals, suggesting
that these CNVs were not associated with
deleterious genes responsible for lethality or major
fitness loss [21]. 

In the small genome model plant Arabidopsis,
CNVs were detected in 402 genes [16], while in 
rice, a comparison of japonica and indica cultivars 
identified 641 CNVs [37]. The majority of these 
rice CNVs suggested a loss of genomic segments in 
the indica cultivar 'Guang-lu-ai 4'. Japonica and 
indica rice diverged around 0.2–0.4 million years
ago and display a high degree of DNA sequence 
variation [52]. Genome-wide patterns of CNVs 
have also been detected in sorghum by comparing 
two sweet and one grain inbred sorghum lines, iden- 
tifying 3234 CNVs in 2600 genes [23]. Soybean was 
the first legume species to have its genome analysed 
for CNVs, and a total of 267 CNVs with an average 
size of 18–23 kb were detected across the genomes
assayed [24] (Table 1). 
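
As an illustration of what "variable copy number" means in practice, the following minimal sketch estimates the copy number of a region from sequencing read depth and compares it with the reference. This is an assumption-laden illustration, not the method of the cited studies, several of which used array hybridization. The function names, the use of the median depth of presumed single-copy regions as a baseline, and the assumption of a homozygous inbred line are all hypothetical choices.

```python
from statistics import median


def estimate_copy_number(region_depth, single_copy_depths):
    """Estimate copies of a region relative to presumed single-copy regions.

    Assumes an inbred (homozygous) line and uniform sequencing coverage.
    """
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return round(region_depth / baseline)


def call_cnv(test_copies, reference_copies):
    """Compare an estimated copy number with the reference genome."""
    if test_copies > reference_copies:
        return "gain"
    if test_copies < reference_copies:
        return "loss"
    return "no change"
```

For example, a region sequenced at three times the single-copy baseline would be estimated at three copies and called a gain relative to a single-copy reference.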

The relationship between CNV occurrence and 
recombination frequency is not fully understood. In 
general, CNVs are scattered across plant genomes. 
Studies conducted in the maize genome have 
revealed that low-recombination regions such as telo- 
meres show a greater number of CNVs [20, 21]. In 
contrast to maize, higher levels of CNV were 
identified in high-recombination regions in soybean 
and barley [18, 24]. 

Presence and absence variations 

Sequences that are present in one genome and absent 
in another genome have been termed presence–absence
variation (PAV). PAVs can be considered to be
extreme CNVs, where the sequence is completely
missing from one or more individuals. A comparison
of sequence data from two maize inbred lines (B73 and
Mo17) detected 1783 PAVs that were present in the
B73 genome and absent in the Mo17 genome. These
PAVs relate to 1270 genes, suggesting that PAV affects
a significant portion of the maize genome. Analysis of these
PAVs highlighted their association with ancestral evo- 
lution events and domestication [7]. Initially, CNVs 
and PAVs were combined for analysis of genome- 
wide variation in maize [21]. However, the mechanism 
of PAV formation was found to be different from that 
for CNVs and is not influenced by recombination. It 
was found that a short deletion mechanism that is based 
on short direct repeats likely contributes to the high 
rate of PAV among maize genotypes [53]. Comparing
sequence data from sweet sorghum and grain
sorghum lines identified 16 487 PAVs associated with
1416 genes. In pigeonpea (Cajanus cajan), PAVs have
been reported in the mitochondrial genomes of male-sterile
(A-), maintainer (B-), hybrid (H-) and wild (W-)
lines [35]. Similar mitochondrial
structural variations have been identified in other 
plant species including maize [54] and Arabidopsis [55]. 
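
One simple way to picture a PAV call is to ask what fraction of a reference gene is covered by sequence aligned from another line. The sketch below is a hypothetical illustration: the function names and the 10% cut-off for calling a gene absent are assumptions, not values taken from the cited studies.

```python
def covered_fraction(gene_start, gene_end, aligned_intervals):
    """Fraction of the half-open interval [gene_start, gene_end) that is covered.

    aligned_intervals is a list of (start, end) pairs; overlaps are counted once.
    """
    length = gene_end - gene_start
    if length <= 0:
        raise ValueError("gene_end must be greater than gene_start")
    covered = 0
    cursor = gene_start  # everything left of the cursor is already counted
    for start, end in sorted(aligned_intervals):
        start = max(start, cursor)
        end = min(end, gene_end)
        if end > start:
            covered += end - start
            cursor = end
    return covered / length


def call_pav(fraction, absent_below=0.1):
    """Call a gene absent when almost none of it is covered (cut-off is assumed)."""
    return "absent" if fraction < absent_below else "present"
```

A gene with no aligned sequence at all has a covered fraction of zero and is called absent, which corresponds to the "extreme CNV" view of PAVs described above.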

Other structural variations 

Other types of submicroscopic structural variation in- 
clude inversions and translocations. These variations 
have been reported in nuclear and organelle genomes 
and are of considerable interest, as they can introduce 
novel diversity in plants. Several studies have reported 
the presence of subgenomic structural variations in 
mitochondrial genomes that have arisen from inversions 
and translocations [56, 57]. While such events in plant 
mitochondria increase organelle genome complexity, 
recombination has also been found to maintain genomic 
stability and may provide a mechanism to increase gen- 
etic variation in the absence of sexual reproduction [58]. 
Genomic inversions can be a driver of speciation, and 
this has been studied in plants using comparative gen- 
omics [59, 60]. An inverted region may not successfully 
recombine with its counterpart chromosome and might 
lead to infertility. Inversions are highly polymorphic in 
some species and may play a critical role in local adap- 
tation [61]. Large-scale inversions have also been char- 
acterized in the chloroplast genomes of land plants [62]. 
Cytological studies have previously been conducted to 
characterize genomic inversions in various plant species; 
however, the application of large-scale genome sequencing
will significantly help in characterizing the complex
landscape of inversions and translocations in plant 
genomes. 

APPROACHES TO IDENTIFY 
SUBMICROSCOPIC STRUCTURAL 
VARIATIONS 

The on-going revolution in DNA sequencing technol- 
ogy known as NGS together with advances in bioinfor- 
matics have allowed structural genetic variations to 
be analysed at high resolution at a genome-wide level 
[63, 64]. SVs differ in size and complexity and hence 
different techniques have been used to characterize 
them in plant genomes. PCR-based approaches have 
been used for targeted regions of the genome. For example,
real-time quantitative PCR (qPCR) was used to
detect multiple copies of the Bot1 gene in barley genotypes
[17] and the MATE1 gene in maize genotypes [32], and a
deletion in the upstream region of Ppd-1 homeologs of
wheat [25]. This technique offers a highly sensitive,
high-throughput alternative to the more traditional
Southern blot used for determining gene copy
number. PCR can also identify small translocations
and inversions, as well as InDel polymorphism and 
CNVs [65]. Below we discuss approaches that have 
had a major impact on the discoveries of submicroscopic 
variants in the plant genome. 
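
To show how qPCR data can be turned into a copy number estimate, the sketch below applies the widely used comparative Ct (2^-ddCt) calculation. This is an assumption about the analysis: the cited studies may have used a different quantification scheme. The function and parameter names are hypothetical, and the calculation assumes close to 100% amplification efficiency for both the target and the single-copy reference gene.

```python
def relative_copy_number(ct_target_test, ct_reference_test,
                         ct_target_calibrator, ct_reference_calibrator,
                         calibrator_copies=1.0):
    """Estimate target gene copies in a test line by the comparative Ct method.

    Each Ct is a qPCR threshold cycle. The calibrator is a line with a known
    copy number of the target gene; the reference gene is assumed single copy.
    """
    delta_test = ct_target_test - ct_reference_test
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_test - delta_calibrator
    # One cycle earlier corresponds to twice as much starting template.
    return calibrator_copies * 2.0 ** (-delta_delta_ct)
```

If the target amplifies two cycles earlier in the test line than in a single-copy calibrator, with identical reference gene Ct values, the estimate is four copies.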

Microarrays 

Microarray-based techniques were among the first 
used to detect genome-wide variation in human and 
plant genomes. In array comparative genomic hybridization
(aCGH), differentially labelled DNA samples from
the test genome and a reference genome are hybridized
to an array. Such an array contains thousands of probes 
developed from known gene sequences. BACs are the 
most popular arrayed targets in aCGH experiments, as 
they provide extensive coverage of the genome; how- 
ever, cDNAs, PCR products and oligonucleotides can 
all be used as array targets. To increase the resolution of 
aCGH, the 'complexity' of the input DNA is reduced 
by a method called representation or whole-genome 
sampling [66]. A number of variations have been 
included in this approach to improve its efficiency, 
for instance using spotted oligonucleotides on 
Affymetrix arrays [67]. 

aCGH was first developed and applied for cancer 
genomics [14], and later used extensively in plant 
genomics to detect SVs [7, 16, 21, 24]. An early 
version of an array used in maize was composed of 
14 423 BACs [7]. In comparison, the latest maize 
array contains 32 450 maize genes [21]. In 
Arabidopsis, a whole-genome CGH array was used 
to estimate SVs [16], and a recently developed 
high-resolution CGH platform was used to investi- 
gate the structure and diversity of genomic introgres- 
sions in two classical soybean near isogenic line 
populations [68]. Several factors affect aCGH-based 
SV detection. Gene distribution along the genome 
captured in arrays is not uniform, leading to bias; the 
majority of the probes are often designed to be com- 
plementary to a single genotype, reducing the effi- 
ciency of detecting SVs in other genotypes; 
sequences that are present in individuals but not in
the reference sequence from which the CGH arrays were
designed would not be represented; hybridization signals
may deviate owing to DNA polymorphisms and
lead to the false calling of SVs; and finally there
remains a need to physically map the location of
the probe in the genome. A further challenge is applying
moderate-density arrays to highly repetitive plant
genomes. In this scenario, a high-density microarray
platform designed for aCGH would greatly improve 
the efficiency of detection and estimation of SVs. 
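
The comparison of test and reference hybridization signals in aCGH is commonly summarized per probe as a log2 ratio. The following minimal sketch shows that calculation; the function names and the symmetric cut-off of 0.5 are hypothetical, and real analyses typically normalize the signals and segment neighbouring probes before calling variants.

```python
import math


def log2_ratio(test_signal, reference_signal):
    """Log2 ratio of test to reference hybridization intensity for one probe."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(test_signal / reference_signal)


def call_probe(ratio, cutoff=0.5):
    """Label a probe by its log2 ratio (the cut-off is an assumed value)."""
    if ratio >= cutoff:
        return "gain"
    if ratio <= -cutoff:
        return "loss"
    return "no change"
```

Because every probe is designed from the reference sequence, a calculation of this kind cannot report sequences that are present in the test genome but missing from the reference, which is one of the limitations listed above.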

Evolving NGS techniques offer several advantages 
over aCGH by enabling the direct detection of DNA 
variations and recombination breakpoints [69]. 
NGS-based approaches also provide the ability to
detect inversions and translocations that are not generally
detected by aCGH. However, aCGH would
still be beneficial in genomic regions with multiple 
repeats where NGS-based assembly is difficult. 

Genome sequencing/re-sequencing 

In recent years, sequencing technologies have rapidly 
evolved from classical Sanger sequencing to NGS 
[70]. This has significantly lowered the cost of 
sequencing DNA. However, there are some limita- 
tions associated with these technologies such as the 
length of a DNA molecule that can be sequenced, 
though there are continuous improvements in this 
area. At present read lengths produced by the various 
technologies range from 25 bp to 15 kb. There is 
usually a compromise between read length, cost 
and accuracy, with low cost or longer read sequen- 
cing generally demonstrating significantly lower 
accuracy than some of the more popular 
technologies. The Illumina sequencing systems 
currently dominate the NGS market and they 
produce accurate reads of 150 bp for the 
HiSeq2500 and 300 bp for the MiSeq. Many NGS 
technologies such as those from Illumina use paired 
end or mate pair sequencing protocols, where two 
reads are generated with a known orientation and 
approximate distance between them. This significantly
assists the specificity of mapping or assembling this
sequence data. Evolving technologies such as Single-Molecule
Real-Time (SMRT) sequencing from
Pacific Biosciences and Moleculo technology from
Illumina have demonstrated the ability to read
long molecules of DNA of 10 kb to 20 kb [71].
Nanopore technology also promises advances in this 
area, though little is known about the specific appli- 
cations. Advances in DNA sequencing technology 
will continue to drive genomics and enhance the 
ability to detect structural variations with increasing 
resolution over a greater number of samples. There 
are three main approaches that can be used for the 
detection of SVs in plant genomes using DNA sequence
data: (i) de novo assembly, (ii) re-sequencing
and (iii) the pan-genome.



i) The de novo assembly approach: In this approach 
two or more unique assemblies can be compared to 
identify and characterize SVs. Once the assemblies 
have been generated, this is a very efficient approach 
and can detect all types of SVs including CNVs, 
PAVs, translocations and inversions (Figure 1). The 
initial assembly needs high sequence coverage and 
sophisticated algorithms to reconstruct the genome 
from short overlapping sequences [72, 73]. This ap- 
proach is the most robust for the characterization of 
SVs in a genome; however, the production of de novo 
assembled genomes of suitable quality remains the 
chief limitation. Draft plant genome assemblies are 
often highly fragmented and may contain many col- 
lapsed repeat regions that confound CNV detection. 
Improving and validating genome assemblies is an 
active research area, which is advancing through 
the application of novel algorithms and improved 
DNA sequence data. However, until sequencing
costs fall significantly and reads become substantially longer,
the de novo assembly of all genotypes representing
a species is unfeasible and this approach is usually
restricted to the detection of inter-species variation.
Different draft genome assemblies from various plant
species have been used to detect lineage-specific
translocations and inversions [59, 60, 74].
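
The comparison of two assemblies can be pictured as classifying aligned blocks by chromosome and orientation. The sketch below is a deliberately simplified illustration: the `AlignedBlock` record and `classify_block` function are hypothetical, it assumes that chromosome names are directly comparable between the two assemblies, and it ignores block order, which real synteny tools also use.

```python
from typing import NamedTuple


class AlignedBlock(NamedTuple):
    chrom_a: str  # chromosome in assembly A
    start_a: int  # block start in assembly A
    chrom_b: str  # chromosome in assembly B
    start_b: int  # block start in assembly B
    strand: str   # "+" if both copies share an orientation, "-" if reversed


def classify_block(block: AlignedBlock) -> str:
    """Give a first-pass label to one aligned block between two assemblies."""
    if block.chrom_a != block.chrom_b:
        return "translocation candidate"
    if block.strand == "-":
        return "inversion candidate"
    return "collinear"
```

Candidates flagged in this way would still need validation, because fragmented assemblies and collapsed repeats can produce spurious rearrangements.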

ii) The re-sequencing approach: In the re-sequen- 
cing approach, DNA sequence reads from individual 
genotypes are aligned to a closely related reference 
genome (Figure 1). Differences between genomes 
then correlate to variations between the aligned 
reads and the reference genome. This approach can 
also be used for the detection of inversions, based on 
the orientation of aligned reads with the reference 
genome. Although this approach may not have such 
a high resolution as the de novo assembly approach, it 
will remain, in our opinion, the preferred method to 
detect intra-specific variation owing to its relatively 
low cost and lack of complexity associated with the 
generation of a de novo genome assembly for each 
variety. The re-sequencing approach has been used
in sorghum, where a set of nearly 1500 genes
harbouring SVs that differentiate sweet and grain
sorghum was identified [23]. Re-sequencing-based
approaches are currently being applied to detect 
SVs in several other projects including the 1001
Genomes Project in Arabidopsis [75], the maize
Panzea project (http://www.panzea.org) and the
rice variation catalogue [22]. We are currently
using this approach in pigeonpea, chickpea
(Cicer arietinum) and peanut (Arachis hypogaea),
re-sequencing 300 lines from reference sets for each
species. These on-going efforts in a variety of plant
species will provide insight into the distribution of
SVs in plants as well as their evolution.

Figure 1: Two major NGS approaches to detect SVs are de novo assembly and re-sequencing. The de novo assembly
method is highly efficient for detecting all types of SVs including CNVs, PAVs, inversions and translocations.
Re-sequencing approaches are viable options to detect CNVs and PAVs.
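
The use of read orientation and spacing in the re-sequencing approach can be sketched as a simple rule set for paired-end reads aligned to the same reference chromosome. The function name, the orientation codes and the tolerance parameter are hypothetical, and the labels are only candidates: production SV callers require many supporting pairs and model the insert size distribution rather than applying a fixed window.

```python
def classify_read_pair(orientation, insert_size, expected_insert, tolerance):
    """Label one aligned read pair from a standard paired-end library.

    orientation: "FR" is the expected inward-facing layout; "FF" or "RR"
    means both reads align to the same strand.
    insert_size: distance spanned by the pair on the reference, in bp.
    """
    if orientation in ("FF", "RR"):
        return "inversion candidate"
    if orientation != "FR":
        return "other discordant"
    if insert_size > expected_insert + tolerance:
        # Reads map further apart than expected: sequence missing in the sample.
        return "deletion candidate"
    if insert_size < expected_insert - tolerance:
        # Reads map closer than expected: extra sequence in the sample.
        return "insertion candidate"
    return "concordant"
```

With an expected insert of 300 bp and a tolerance of 100 bp, a correctly oriented pair spanning 5300 bp on the reference would be flagged as a deletion candidate.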

iii) The pan-genome: The pan-genome is composed
of a core genome and a dispensable genome.
The core genome contains genome segments or genes
that are present in all accessions, while the dispensable
genome is composed of partially shared and accession-specific
DNA sequence elements. This concept of
separate core and dispensable genomes was first
described in prokaryotes [76]. A single genome sequence
does not capture the entire genomic architecture
of a species, so a pan-genome approach enables
the description of a species rather than an individual at
the genome level. Multiple-accession sequencing projects
in several plant species enable the creation of pan-genomes
by defining the core and dispensable genome
components of a species. The pan-genome has been
described in some plants, e.g. maize [77–79] and
Arabidopsis thaliana [80, 81].
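
The split into core and dispensable components follows directly from set operations on per-accession gene lists. The sketch below is a minimal illustration; the function name and the input format (a mapping from accession name to the set of genes detected in it) are assumptions.

```python
def partition_pan_genome(gene_presence):
    """Split genes into core (in all accessions) and dispensable (in some).

    gene_presence maps an accession name to the collection of genes found in it.
    """
    if not gene_presence:
        raise ValueError("at least one accession is required")
    gene_sets = [set(genes) for genes in gene_presence.values()]
    pan_genome = set().union(*gene_sets)    # every gene seen anywhere
    core = set.intersection(*gene_sets)     # genes shared by all accessions
    dispensable = pan_genome - core         # partially shared or unique genes
    return core, dispensable
```

Note that the core genome can only shrink, and the pan-genome can only grow, as further accessions are added.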



ASSOCIATION OF SVs WITH PLANT 
PHENOTYPES 

The role of SVs has been found to be important in 
human evolution and disease [13, 21], and SVs have 
been shown to be more frequent than SNPs in 
human genomes [13]. Although SVs have also 
been discovered in plants, their discovery and char- 
acterization are heavily reliant on the availability of at 
least one reference genome [82]. Few studies have 
been conducted to characterize the role of SVs in 
shaping plant phenotypes. The role of PAVs in 
determining plant phenotype has been demonstrated 
in opium poppy (Papaver somniferum), where a cluster of 10
genes spanning a 221 kb genomic region was found
to be associated with noscapine synthesis. Analysis of
an F2 mapping population indicated that these genes
are tightly linked and absent in non-noscapine-producing
lines [34]. Many of the CNVs identified
in maize were found to be associated with domesti- 
cation [21, 30]. The effect of selection on maize 
diversity has been estimated by sequencing 278 
temperate maize inbred lines from different stages of
breeding history. The results demonstrated that
modern breeding has introduced highly dynamic
genetic variations in the form of SNPs, InDels
and CNVs, and affected a number of genic and
non-genic regions in the maize genome [33]. The
first-generation maize HapMap was constructed 
using sequence polymorphisms between 27 diverse 
inbred lines. This identified 18 regions that have 
undergone selective sweeps, including one region 
of 11 Mb on the long arm of chromosome 10 [83]. 
The second-generation maize HapMap was con- 
structed using 103 lines and identified SVs that are 
enriched at loci associated with important traits [30]. 
An RNA-seq experiment using diverse lines of 
maize detected 757 loci that were restricted to a 
subset of the lines. Using de novo assembly of un- 
mapped reads, novel transcripts were identified. It 
was also demonstrated that PAVs observed between 
different heterotic groups were transcribed. 
Furthermore, a core set and dispensable set of 
genes were identified [84]. Similarly, Lai et al. [31]
re-sequenced six elite maize inbred lines, including 
the parents of the commercial hybrids, and found 
296 genes in B73 that were missing from at least 
one of the six inbred lines. Inbred lines representing 
different heterotic groups contained different 
sets of deleted genes. In both RNA-seq [84] and 
re-sequencing [31] studies it was postulated that 
unique transcripts or genes present in different het- 
erotic groups might be contributing to the genetic 
basis of heterosis. In a recent study in maize by 
Maron et al. [32], CNVs were identified for the
MATE1 gene in aluminium-tolerant lines, but
these were not common in teosinte. This study suggested
that multiple copies of the MATE1 gene arose
recently and probably after domestication, and that
CNVs were selected for their association with
aluminium tolerance. MATE1 expression was found to
be associated with CNV, where three MATE1
copies were identical and part of a tandem triplication.
Only three maize inbred lines carrying the
three-copy allele and demonstrating higher aluminium
tolerance were identified from maize and
teosinte diversity panels [32].

CNV of a 31 kb repeat segment, which encodes multiple
gene products, was observed in different haplotypes of
the Rhg1 locus in soybean cyst nematode (SCN)-resistant
varieties. In SCN-susceptible varieties, one
copy of the 31 kb segment per haploid genome was
present. SCN resistance was found to be associated
with increased expression of the CNV-related genes
[85]. In an interesting study in Palmer amaranth
(Amaranthus palmeri), some plants were found to be resistant
to the herbicide glyphosate. These resistant plants contained
5–160 more copies of the EPSPS gene than
susceptible plants. Expression and protein levels of
the EPSPS gene were positively correlated with
copy number [86].

In wheat, recent associations of SVs with plant
phenotype have come in the form of CNVs and large
InDel polymorphisms. CNV for the gene Vrn-A1 is
associated with intermediate or late flowering
phenotypes. CNV of Ppd-B1 was found to contribute
to photoperiod sensitivity in wheat [40]. Genotypes
with a single copy of the Ppd-B1 gene were photoperiod
sensitive, while genotypes with elevated copy
numbers were found to be early flowering and day-neutral
[40]. An InDel polymorphism in the 5′ upstream
region of the Ppd-1 gene was also associated
with heading time of wheat cultivars [25]. In barley,
a CACTA-like transposon insertion 5 kb upstream of
the open reading frame (ORF) of the aluminium
tolerance gene HvAACT1 enhances HvAACT1 expression
and alters its tissue localization [29].
Another example of trait-associated CNVs in 
barley is the boron efflux carrier gene Bot1, which
plays an important role in boron tolerance [17].
CNVs have been found to be associated with nucleotide-binding
leucine-rich repeat (NB-LRR)
genes and receptor-like kinase (RLK) genes,
known to be involved in plant defence-related 
mechanisms. CNVs related to disease resistance and 
biotic stress responses have also been identified in 
Arabidopsis [27], rice [22] and soybean [24]. Variable 
copies of these genes may be advantageous in the 
face of changing environmental conditions and possible
threats posed by continuously evolving pests and
pathogens.



OUTLOOK 

Results from plant genome analysis have demon- 
strated the importance of SVs in evolutionary and 
biological processes. Initial studies conducted in a 
limited number of plant species suggest that a range 
of SVs are present and distributed across the gen- 
omes. It is anticipated that SVs will contribute an 
equal amount to the overall variation observed in 
the genome as SNPs. The low level of sequence 
diversity that is often suggested to exist in some of 
the self-pollinated or partially cross-pollinated crop
species might therefore be considered to be an over- 
estimate. There remain challenges that need to be 
resolved before we achieve a complete understand- 
ing of the genome and its relationship with the plant 
phenotype. These include the effect of combinations 
of variants, interactions between genetic and envir- 
onmental factors and epigenetic mechanisms. At pre- 
sent, no single method has the capability to detect 
the total complement of genomic structural vari- 
ations. Even genome re-sequencing that is being 
applied in a number of important plant species 
would resolve only a proportion of the structural 
variation present in the genome. The highest reso- 
lution studies of SVs can be achieved by using a de 
novo assembly-based approach; however, this is not 
currently feasible for large numbers of individuals. 
Further, continuous improvements in sequencing 
technologies and reduction in costs will make it pos- 
sible to detect nearly all variants between genomes. 
Even after de novo assembly, a significant amount of 
information could be lost owing to the challenges of 
assembling SVs using the available algorithms, and 
major advances in sequencing technology are 
required to facilitate accurate whole-genome assem- 
bly on a large scale. Improved assembly algorithms, 
combined with the ability to accurately sequence 
long stretches of DNA, would be beneficial to over- 
come many of these limitations. On-going and 
future efforts would greatly facilitate studies aimed 
at correlating genetic variations with plant perform- 
ance. These efforts will also provide better under- 
standing of the nature of the population history, 
natural selection and impact of structural variation 
in the plant genomes. 